智能论文笔记

Conformer Based Elderly Speech Recognition System for Alzheimer's Disease Detection

Tianzi Wang , Jiajun Deng , Mengzhe Geng , Zi Ye , Shoukang Hu , Yi Wang , Mingyu Cui , Zengrui Jin , Xunying Liu , Helen Meng

分类：机器学习

2022-06-23

阿尔茨海默氏病（AD）的早期诊断对于促进预防性护理以延迟进一步发展至关重要。本文介绍了建立在痴呆症Pitt copus上的基于最新的构象识别系统以自动检测的开发。通过纳入一组有目的设计的建模功能，包括基于域搜索的自动配置特异性构象异构体超参数除外，还包括基于速度扰动和基于规格的数据增强训练的基线构象体系统可显着改善。使用学习隐藏单位贡献（LHUC）的细粒度老年人的适应性；以及与混合TDNN系统的基于两次通行的跨系统逆转。在48位老年人的评估数据上获得了总体单词错误率（相对34.8％）的总体单词错误率（相对34.8％）。使用最终系统的识别输出来提取文本特征，获得了最佳的基于语音识别的AD检测精度为91.7％。

translated by 谷歌翻译

Personalized Adversarial Data Augmentation for Dysarthric and Elderly Speech Recognition

Zengrui Jin , Mengzhe Geng , Jiajun Deng , Tianzi Wang , Shujie Hu , Guinan Li , Xunying Liu

分类：人工智能

2022-05-13

尽管针对正常语音的自动语音识别（ASR）技术取得了迅速的进展，但迄今为止，准确认识违反障碍和老年语音仍然是高度挑战的任务。由于这些用户中经常发现的移动性问题，很难为ASR系统开发收集大量此类数据。为此，数据增强技术起着至关重要的作用。与现有的数据增强技术相反，仅修改光谱轮廓的说话速率或整体形状，使用一组新颖的扬声器依赖（SD）生成对抗网络（Gan ）本文基于数据增强方法。这些既可以灵活地允许：a）在可用的语音数据可用时修改时间或速度的正常语音光谱，并更接近受损说话者的扬声器； b）对于非平行数据，SVD分解了正常语音频谱基础特征，要转换为目标老年人说话者的特征，然后再与时间基础重组以生成最先进的TDNN的增强数据和构象体ASR系统培训。实验是针对四个任务进行的：英语Uapseech和Torgo违反语音语音Corpora；英国痴呆症皮特和广东话JCCOCC MOCA老年语音数据集。所提出的基于GAN的数据增强方法始终优于基线速度扰动方法，最多可在Torgo和Dementiabank数据上降低4.91％和3.0％的绝对速度（相对相对9.61％和6.4％）。应用基于LHUC的扬声器适应后，保留了一致的性能改进。

translated by 谷歌翻译

Spectral Bandwidth Recovery of Optical Coherence Tomography Images using Deep Learning

Timothy T. Yu , Da Ma , Jayden Cole , Myeong Jin Ju , Mirza F. Beg , Marinko V. Sarunic

分类：人工智能 | 计算机视觉

2023-01-02

Optical coherence tomography (OCT) captures cross-sectional data and is used for the screening, monitoring, and treatment planning of retinal diseases. Technological developments to increase the speed of acquisition often results in systems with a narrower spectral bandwidth, and hence a lower axial resolution. Traditionally, image-processing-based techniques have been utilized to reconstruct subsampled OCT data and more recently, deep-learning-based methods have been explored. In this study, we simulate reduced axial scan (A-scan) resolution by Gaussian windowing in the spectral domain and investigate the use of a learning-based approach for image feature reconstruction. In anticipation of the reduced resolution that accompanies wide-field OCT systems, we build upon super-resolution techniques to explore methods to better aid clinicians in their decision-making to improve patient outcomes, by reconstructing lost features using a pixel-to-pixel approach with an altered super-resolution generative adversarial network (SRGAN) architecture.

translated by 谷歌翻译

ReSQueing Parallel and Private Stochastic Convex Optimization

Yair Carmon , Arun Jambulapati , Yujia Jin , Yin Tat Lee , Daogao Liu , Aaron Sidford , Kevin Tian

分类：机器学习 | (统计)机器学习

2023-01-01

We introduce a new tool for stochastic convex optimization (SCO): a Reweighted Stochastic Query (ReSQue) estimator for the gradient of a function convolved with a (Gaussian) probability density. Combining ReSQue with recent advances in ball oracle acceleration [CJJJLST20, ACJJS21], we develop algorithms achieving state-of-the-art complexities for SCO in parallel and private settings. For a SCO objective constrained to the unit ball in $\mathbb{R}^d$, we obtain the following results (up to polylogarithmic factors). We give a parallel algorithm obtaining optimization error $\epsilon_{\text{opt}}$ with $d^{1/3}\epsilon_{\text{opt}}^{-2/3}$ gradient oracle query depth and $d^{1/3}\epsilon_{\text{opt}}^{-2/3} + \epsilon_{\text{opt}}^{-2}$ gradient queries in total, assuming access to a bounded-variance stochastic gradient estimator. For $\epsilon_{\text{opt}} \in [d^{-1}, d^{-1/4}]$, our algorithm matches the state-of-the-art oracle depth of [BJLLS19] while maintaining the optimal total work of stochastic gradient descent. We give an $(\epsilon_{\text{dp}}, \delta)$-differentially private algorithm which, given $n$ samples of Lipschitz loss functions, obtains near-optimal optimization error and makes $\min(n, n^2\epsilon_{\text{dp}}^2 d^{-1}) + \min(n^{4/3}\epsilon_{\text{dp}}^{1/3}, (nd)^{2/3}\epsilon_{\text{dp}}^{-1})$ queries to the gradients of these functions. In the regime $d \le n \epsilon_{\text{dp}}^{2}$, where privacy comes at no cost in terms of the optimal loss up to constants, our algorithm uses $n + (nd)^{2/3}\epsilon_{\text{dp}}^{-1}$ queries and improves recent advancements of [KLL21, AFKT21]. In the moderately low-dimensional setting $d \le \sqrt n \epsilon_{\text{dp}}^{3/2}$, our query complexity is near-linear.

translated by 谷歌翻译

Mapping smallholder cashew plantations to inform sustainable tree crop expansion in Benin

Leikun Yin , Rahul Ghosh , Chenxi Lin , David Hale , Christoph Weigl , James Obarowski , Junxiong Zhou , Jessica Till , Xiaowei Jia , Troy Mao

分类：计算机视觉 | 机器学习

2023-01-01

Cashews are grown by over 3 million smallholders in more than 40 countries worldwide as a principal source of income. As the third largest cashew producer in Africa, Benin has nearly 200,000 smallholder cashew growers contributing 15% of the country's national export earnings. However, a lack of information on where and how cashew trees grow across the country hinders decision-making that could support increased cashew production and poverty alleviation. By leveraging 2.4-m Planet Basemaps and 0.5-m aerial imagery, newly developed deep learning algorithms, and large-scale ground truth datasets, we successfully produced the first national map of cashew in Benin and characterized the expansion of cashew plantations between 2015 and 2021. In particular, we developed a SpatioTemporal Classification with Attention (STCA) model to map the distribution of cashew plantations, which can fully capture texture information from discriminative time steps during a growing season. We further developed a Clustering Augmented Self-supervised Temporal Classification (CASTC) model to distinguish high-density versus low-density cashew plantations by automatic feature extraction and optimized clustering. Results show that the STCA model has an overall accuracy of 80% and the CASTC model achieved an overall accuracy of 77.9%. We found that the cashew area in Benin has doubled from 2015 to 2021 with 60% of new plantation development coming from cropland or fallow land, while encroachment of cashew plantations into protected areas has increased by 70%. Only half of cashew plantations were high-density in 2021, suggesting high potential for intensification. Our study illustrates the power of combining high-resolution remote sensing imagery and state-of-the-art deep learning algorithms to better understand tree crops in the heterogeneous smallholder landscape.

translated by 谷歌翻译

MTNeuro: A Benchmark for Evaluating Representations of Brain Structure Across Multiple Levels of Abstraction

Jorge Quesada , Lakshmi Sathidevi , Ran Liu , Nauman Ahad , Joy M. Jackson , Mehdi Azabou , Jingyun Xiao , Christopher Liding , Matthew Jin , Carolina Urzay

分类：计算机视觉 | 机器学习

2023-01-01

There are multiple scales of abstraction from which we can describe the same image, depending on whether we are focusing on fine-grained details or a more global attribute of the image. In brain mapping, learning to automatically parse images to build representations of both small-scale features (e.g., the presence of cells or blood vessels) and global properties of an image (e.g., which brain region the image comes from) is a crucial and open challenge. However, most existing datasets and benchmarks for neuroanatomy consider only a single downstream task at a time. To bridge this gap, we introduce a new dataset, annotations, and multiple downstream tasks that provide diverse ways to readout information about brain structure and architecture from the same image. Our multi-task neuroimaging benchmark (MTNeuro) is built on volumetric, micrometer-resolution X-ray microtomography images spanning a large thalamocortical section of mouse brain, encompassing multiple cortical and subcortical regions. We generated a number of different prediction challenges and evaluated several supervised and self-supervised models for brain-region prediction and pixel-level semantic segmentation of microstructures. Our experiments not only highlight the rich heterogeneity of this dataset, but also provide insights into how self-supervised approaches can be used to learn representations that capture multiple attributes of a single image and perform well on a variety of downstream tasks. Datasets, code, and pre-trained baseline models are provided at: https://mtneuro.github.io/ .

translated by 谷歌翻译

An end-to-end multi-scale network for action prediction in videos

Xiaofa Liu , Jianqin Yin , Yuan Sun , Zhicheng Zhang , Jin Tang

分类：计算机视觉

2022-12-31

In this paper, we develop an efficient multi-scale network to predict action classes in partial videos in an end-to-end manner. Unlike most existing methods with offline feature generation, our method directly takes frames as input and further models motion evolution on two different temporal scales.Therefore, we solve the complexity problems of the two stages of modeling and the problem of insufficient temporal and spatial information of a single scale. Our proposed End-to-End MultiScale Network (E2EMSNet) is composed of two scales which are named segment scale and observed global scale. The segment scale leverages temporal difference over consecutive frames for finer motion patterns by supplying 2D convolutions. For observed global scale, a Long Short-Term Memory (LSTM) is incorporated to capture motion features of observed frames. Our model provides a simple and efficient modeling framework with a small computational cost. Our E2EMSNet is evaluated on three challenging datasets: BIT, HMDB51, and UCF101. The extensive experiments demonstrate the effectiveness of our method for action prediction in videos.

translated by 谷歌翻译

An Instrumental Variable Approach to Confounded Off-Policy Evaluation

Yang Xu , Jin Zhu , Chengchun Shi , Shikai Luo , Rui Song

分类： (统计)机器学习 | 机器学习

2022-12-29

Off-policy evaluation (OPE) is a method for estimating the return of a target policy using some pre-collected observational data generated by a potentially different behavior policy. In some cases, there may be unmeasured variables that can confound the action-reward or action-next-state relationships, rendering many existing OPE approaches ineffective. This paper develops an instrumental variable (IV)-based method for consistent OPE in confounded Markov decision processes (MDPs). Similar to single-stage decision making, we show that IV enables us to correctly identify the target policy's value in infinite horizon settings as well. Furthermore, we propose an efficient and robust value estimator and illustrate its effectiveness through extensive simulations and analysis of real data from a world-leading short-video platform.

translated by 谷歌翻译

OrthoGAN:High-Precision Image Generation for Teeth Orthodontic Visualization

Feihong Shen , JIngjing Liu , Haizhen Li , Bing Fang , Chenglong Ma , Jin Hao , Yang Feng , Youyi Zheng

分类：计算机视觉

2022-12-29

Patients take care of what their teeth will be like after the orthodontics. Orthodontists usually describe the expectation movement based on the original smile images, which is unconvincing. The growth of deep-learning generative models change this situation. It can visualize the outcome of orthodontic treatment and help patients foresee their future teeth and facial appearance. While previous studies mainly focus on 2D or 3D virtual treatment outcome (VTO) at a profile level, the problem of simulating treatment outcome at a frontal facial image is poorly explored. In this paper, we build an efficient and accurate system for simulating virtual teeth alignment effects in a frontal facial image. Our system takes a frontal face image of a patient with visible malpositioned teeth and the patient's 3D scanned teeth model as input, and progressively generates the visual results of the patient's teeth given the specific orthodontics planning steps from the doctor (i.e., the specification of translations and rotations of individual tooth). We design a multi-modal encoder-decoder based generative model to synthesize identity-preserving frontal facial images with aligned teeth. In addition, the original image color information is used to optimize the orthodontic outcomes, making the results more natural. We conduct extensive qualitative and clinical experiments and also a pilot study to validate our method.

translated by 谷歌翻译

OMSN and FAROS: OCTA Microstructure Segmentation Network and Fully Annotated Retinal OCTA Segmentation Dataset

Peng Xiao , Xiaodong Hu , Ke Ma , Gengyuan Wang , Ziqing Feng , Yuancong Huang , Jin Yuan

分类：计算机视觉

2022-12-26

The lack of efficient segmentation methods and fully-labeled datasets limits the comprehensive assessment of optical coherence tomography angiography (OCTA) microstructures like retinal vessel network (RVN) and foveal avascular zone (FAZ), which are of great value in ophthalmic and systematic diseases evaluation. Here, we introduce an innovative OCTA microstructure segmentation network (OMSN) by combining an encoder-decoder-based architecture with multi-scale skip connections and the split-attention-based residual network ResNeSt, paying specific attention to OCTA microstructural features while facilitating better model convergence and feature representations. The proposed OMSN achieves excellent single/multi-task performances for RVN or/and FAZ segmentation. Especially, the evaluation metrics on multi-task models outperform single-task models on the same dataset. On this basis, a fully annotated retinal OCTA segmentation (FAROS) dataset is constructed semi-automatically, filling the vacancy of a pixel-level fully-labeled OCTA dataset. OMSN multi-task segmentation model retrained with FAROS further certifies its outstanding accuracy for simultaneous RVN and FAZ segmentation.

translated by 谷歌翻译